On the Sample Complexity of Reinforcement Learning with a Generative Model
Authors
Abstract
We consider the problem of learning the optimal action-value function in discounted-reward Markov decision processes (MDPs). We prove a new PAC bound on the sample complexity of the model-based value iteration algorithm in the presence of a generative model of the MDP, which indicates that for an MDP with $N$ state-action pairs and discount factor $\gamma \in [0, 1)$, only $O\big(N \log(N/\delta) / ((1-\gamma)^3 \varepsilon^2)\big)$ samples are required to find an $\varepsilon$-optimal estimate of the action-value function with probability $1 - \delta$. We also prove a matching lower bound of $\Theta\big(N \log(N/\delta) / ((1-\gamma)^3 \varepsilon^2)\big)$ on the sample complexity of estimating the optimal action-value function by every RL algorithm. To the best of our knowledge, this is the first minimax result on the sample complexity of estimating the optimal (action-)value function in which the upper bound matches the lower bound of RL in terms of $N$, $\varepsilon$, $\delta$ and $1 - \gamma$. Also, both our lower bound and our upper bound improve on the state of the art in terms of $1/(1-\gamma)$.
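As a rough illustration of the setting described in the abstract, the following is a minimal sketch of model-based Q-value iteration with a generative model: draw a fixed number of next-state samples for every state-action pair, form the empirical transition model, and run value iteration on it. It is not the authors' implementation. The function names (`model_based_qvi`, `two_state_sampler`), the assumption that rewards are known and stored in an array, and the toy two-state MDP are all hypothetical choices made for this example. The per-pair sample count `n_samples` is left to the caller, because the constants hidden in the $O(\cdot)$ bound are not given here.

```python
import numpy as np


def model_based_qvi(sample_next_state, rewards, gamma, n_samples, n_iters, rng):
    """Estimate Q* from an empirical model built with a generative model.

    sample_next_state(s, a, rng) -> int is the (hypothetical) generative model.
    rewards has shape (S, A) and is assumed known for this sketch.
    """
    n_states, n_actions = rewards.shape

    # Empirical transition model: n_samples calls per state-action pair.
    p_hat = np.zeros((n_states, n_actions, n_states))
    for s in range(n_states):
        for a in range(n_actions):
            for _ in range(n_samples):
                p_hat[s, a, sample_next_state(s, a, rng)] += 1.0
    p_hat /= n_samples

    # Value iteration on the empirical MDP.
    q = np.zeros((n_states, n_actions))
    for _ in range(n_iters):
        v = q.max(axis=1)                # greedy state values, shape (S,)
        q = rewards + gamma * p_hat @ v  # Bellman optimality backup, shape (S, A)
    return q


def two_state_sampler(s, a, rng):
    """Toy generative model (assumption): action a moves deterministically to state a."""
    return a


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Reward 1 for action 1 and 0 for action 0, in both states.
    toy_rewards = np.array([[0.0, 1.0], [0.0, 1.0]])
    q_hat = model_based_qvi(two_state_sampler, toy_rewards, gamma=0.5,
                            n_samples=10, n_iters=60, rng=rng)
    print(q_hat)
```

In this sketch the total number of generative-model calls is `n_samples` times the number of state-action pairs, which is the quantity the bounds in the abstract refer to as the number of samples.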
Similar resources
The effect of instruction based on the generative learning design model on nursing students' learning in a physiology course
Introduction: Traditional educational methods do not meet today's educational needs; modern educational systems use new teaching methods that enrich the teaching-learning process. The purpose of this study was to evaluate the effect of instruction based on the generative learning design model on nursing students' learning of physiology. Methods: In this study, the pr...
Value-Aware Loss Function for Model-based Reinforcement Learning
We consider the problem of estimating the transition probability kernel to be used by a model-based reinforcement learning (RL) algorithm. We argue that estimating a generative model that minimizes a probabilistic loss, such as the log-loss, is overkill because it does not take into account the underlying structure of the decision problem and the RL algorithm that intends to solve it. We introdu...
Value-Aware Loss Function for Model Learning in Reinforcement Learning
We consider the problem of estimating the transition probability kernel to be used by a model-based reinforcement learning (RL) algorithm. We argue that estimating a generative model that minimizes a probabilistic loss, such as the log-loss, might be an overkill because such a probabilistic loss does not take into account the underlying structure of the decision problem and the RL algorithm tha...
Model-Based Value Expansion for Efficient Model-Free Reinforcement Learning
Recent model-free reinforcement learning algorithms have proposed incorporating learned dynamics models as a source of additional data with the intention of reducing sample complexity. Such methods hold the promise of incorporating imagined data coupled with a notion of model uncertainty to accelerate the learning of continuous control tasks. Unfortunately, they rely on heuristics that limit us...
Teaching-Learning approach in complexity paradigm
"Teaching-Learning Approach" is a model of interaction between teachers and students in an educational environment and one of the main components of the educational system. This model can be organized and designed on the basis of various opinions and ideas, including philosophical or scientific theories. This research aims to design and explain teaching-learning approach based on the complexity...